WindSeer: real-time volumetric wind prediction over complex terrain aboard a small uncrewed aerial vehicle

Real-time high-resolution wind predictions are beneficial for various applications including safe crewed and uncrewed aviation. Current weather models require too much compute and lack the necessary predictive capabilities as they are valid only at the scale of multiple kilometers and hours – much lower spatial and temporal resolutions than these applications require. Our work demonstrates the ability to predict low-altitude time-averaged wind fields in real time on limited-compute devices, from only sparse measurement data. We train a deep neural network-based model, WindSeer, using only synthetic data from computational fluid dynamics simulations and show that it can successfully predict real wind fields over terrain with known topography from just a few noisy and spatially clustered wind measurements. WindSeer can generate accurate predictions at different resolutions and domain sizes on previously unseen topography without retraining. We demonstrate that the model successfully predicts historical wind data collected by weather stations and wind measured by drones during flight.

Supplementary Note 2: NWP data as WindSeer input
Our initial hypothesis was to train WindSeer on the known high-resolution terrain and on predictions from large-scale numerical weather prediction (NWP). The Swiss COSMO-1 model provides predictions with a horizontal resolution of 1.1 km [2]. The elevation data used in NWP models, such as GLOBE [3], is an aggregation of available high-resolution terrain sources (usually the mean or median of the high-resolution data within one cell). This smoothed topography neglects smaller-scale terrain features and therefore only provides meaningful results at a scale of multiple cells, i.e. several kilometers. We conducted a test flight to evaluate how well the NWP of a single cell matches the wind measured by an sUAV. We measured the wind at one grid point of the Swiss COSMO-1 model close to Flüelen (46°53'33" N, 8°36'45" E, 436 m above mean sea level (AMSL)). Although Flüelen lies within the Swiss Alps, this particular test site is bordered on one side by a lake and surrounded by flat, smooth terrain within a 1 km radius, so the NWP terrain model matches the high-resolution terrain well.
As Supplementary Figure 2 A) shows, the COSMO-1 NWP poorly represents the sUAV data in both wind magnitude and wind direction. Under different conditions the NWP might fit the measurements better; however, this means that, depending on the case, the NWP may or may not be accurate. WindSeer therefore needs another, more reliable source of input data. In addition, in the presence of large elevation changes, the coarse terrain representation can result in large altitude offsets of the NWP relative to the actual terrain (Supplementary Figure 2 B)).
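The smoothing effect described above can be illustrated with a minimal sketch, assuming mean aggregation of non-overlapping blocks of a high-resolution height grid (one of the two aggregation rules mentioned for GLOBE-type elevation data). The function names, the block factor, and the toy valley profile are hypothetical; they only show how a coarse cell can sit hundreds of meters above or below the actual terrain.

```python
def coarsen_terrain(heights, factor):
    """Average non-overlapping factor x factor blocks of a high-resolution height grid.

    heights: list of rows of terrain heights in meters.
    Trailing rows/columns that do not fill a complete block are dropped.
    """
    ny, nx = len(heights), len(heights[0])
    coarse = []
    for j0 in range(0, ny - ny % factor, factor):
        row = []
        for i0 in range(0, nx - nx % factor, factor):
            block = [heights[j][i]
                     for j in range(j0, j0 + factor)
                     for i in range(i0, i0 + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def altitude_offsets(heights, factor):
    """Coarse-cell height minus true height for every covered high-resolution cell.

    A positive offset means the coarse terrain lies above the real ground, so a
    height quoted relative to the coarse terrain is higher above the real ground.
    """
    coarse = coarsen_terrain(heights, factor)
    return [[coarse[j // factor][i // factor] - heights[j][i]
             for i in range(len(coarse[0]) * factor)]
            for j in range(len(coarse) * factor)]


# Toy valley: a 4 x 4 grid with a 1000 m elevation step, coarsened by a factor of 4.
valley = [[400, 400, 1400, 1400]] * 4
print(coarsen_terrain(valley, 4))      # [[900.0]]
print(altitude_offsets(valley, 4)[0])  # [500.0, 500.0, -500.0, -500.0]
```

Averaging keeps the total volume of the terrain but erases the step, so both the valley floor and the ridge are misrepresented by half the elevation difference.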
The NWP data may still provide supplemental information to WindSeer if used together with the sparse measurements. However, the mapping from the NWP data to the actual flow would first need to be established. This would be a highly data-driven task, and if that relationship is too noisy, WindSeer might learn to ignore the NWP input altogether.
Models trained with different downsampling methods (max-pooling (MP), average-pooling (AP), and strided convolutions) perform comparably, with a slight edge for the pooling methods over the strided convolution (1.1 % error reduction). The model using only the horizontal wind measurements (NUZ) outperforms the baseline (BL) model, which also uses the vertical measurements, by 2.6 %. We also varied the input trajectory length up to 500 cells (LT). Networks trained on longer trajectories perform 13.6 % better, even when they are evaluated exclusively on short trajectories of up to 50 cells.
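The three downsampling options compared in this ablation can be sketched in one dimension as follows. This is an illustrative sketch, not WindSeer's 3-D implementation; the kernel passed to the strided convolution is a hypothetical fixed value, whereas in the network its weights are learned.

```python
def max_pool(x, size=2):
    """Keep the largest value of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def avg_pool(x, size=2):
    """Replace each non-overlapping window by its mean."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def strided_conv(x, kernel, stride=2):
    """Weighted sum over each window, advancing by `stride` cells."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(0, len(x) - k + 1, stride)]


signal = [1.0, 3.0, 2.0, 8.0]
print(max_pool(signal))                  # [3.0, 8.0]
print(avg_pool(signal))                  # [2.0, 5.0]
print(strided_conv(signal, [0.5, 0.5]))  # [2.0, 5.0]
```

With equal fixed weights the strided convolution reduces to average pooling; the practical difference is that its weights are trained, which adds parameters to the encoder.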

Input noise ablation study
Realistic wind measurements are subject to noise. We model the sensor noise with a zero-mean Gaussian distribution and sensor miscalibration with a constant bias. To evaluate the robustness of the model to such noisy input, we trained multiple models (BL architecture) with varying levels of input noise. We varied the standard deviation of the Gaussian noise between 0 % and 80 % of the average flow magnitude of the respective sample, and the bias between 0 % and 50 % of the flow magnitude. We then evaluated the models in two ways. First, we evaluated them on the test set with the same noise distribution they observed during training. Since this comparison is not fair, as predicting from high-noise data is more difficult than from low-noise data, we also evaluated all models on perfect data (no noise added). The results of the first evaluation are displayed in Supplementary Figure 3 A). In general, higher input noise leads to higher prediction errors, but up to 10 % Gaussian noise and bias we observed similar errors. When evaluated on perfect input, the models trained with low noise levels (up to 10 % bias and 30 % Gaussian noise) perform comparably to the baseline model trained without noise (Supplementary Figure 3 B)). Training with too high a noise level thus also degrades performance when the models are provided with perfect input data.
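The noise model described above can be sketched as follows, assuming one bias value per velocity channel drawn uniformly from ±(bias fraction × average flow magnitude) and shared by all measurements of a sample. The function name, the uniform bias distribution, and the tuple layout are assumptions for illustration; the text only specifies zero-mean Gaussian noise and a constant bias, both scaled by the sample's average flow magnitude.

```python
import random


def add_measurement_noise(measurements, avg_flow_magnitude, gauss_frac, bias_frac, rng=None):
    """Corrupt sparse wind measurements with Gaussian noise plus a constant bias.

    measurements:       list of (u, v, w) tuples, one per measured cell.
    avg_flow_magnitude: average flow magnitude of the CFD sample (m/s).
    gauss_frac:         noise standard deviation as a fraction of that magnitude.
    bias_frac:          maximum bias as a fraction of that magnitude.
    """
    rng = rng or random.Random()
    sigma = gauss_frac * avg_flow_magnitude
    max_bias = bias_frac * avg_flow_magnitude
    # Assumption: one constant bias per channel, shared by every measurement.
    bias = [rng.uniform(-max_bias, max_bias) for _ in range(3)]
    return [tuple(value + b + rng.gauss(0.0, sigma) for value, b in zip(m, bias))
            for m in measurements]


clean = [(5.0, 1.0, 0.0), (4.0, 2.0, 0.5)]
# e.g. 30 % Gaussian noise and 10 % bias relative to a 5 m/s average flow
noisy = add_measurement_noise(clean, 5.0, 0.3, 0.1, random.Random(0))
```

Because the bias is shared by all measurements of a sample, it cannot be averaged out by the network, unlike the independent Gaussian component.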

CFD prediction results of different WindSeer variants
In our evaluation, we considered six variants of WindSeer (ZD4, ZD6, AD4, AD6, VD4, VD6), obtained by varying the fill value and the network depth. The fill indicator (Z, A, V) specifies how the wind speed input channels are filled in cells without measurements. We tested fill values of zero (Z), the per-channel average of all measurements (A), and the Voronoi tessellation presented in [4] (V), which essentially assigns each cell the value of the nearest measurement. The network depth is the number of pooling/upsampling layers in the encoder/decoder of WindSeer; we evaluated depths of four (D4) and six (D6), resulting in receptive field sizes of 175 and 703, respectively. The models were trained with the Adam optimizer [5] for 3000 epochs, except for AD6, for which we chose the model after 1000 epochs because further training increased the validation loss, suggesting over-fitting.
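The three fill strategies can be sketched for a single input channel as follows. This is a minimal illustration, assuming measurements stored as a dictionary from (k, j, i) cell indices to values, and using the nearest measured cell (squared Euclidean distance in index space, ties broken by insertion order) as a stand-in for the Voronoi tessellation of [4]. The function name and data layout are hypothetical; measured cells keep their values in every mode.

```python
def fill_channel(shape, measurements, mode):
    """Return a dense nested list [k][j][i] for one wind input channel.

    shape:        (nz, ny, nx) grid size.
    measurements: dict mapping (k, j, i) -> measured value.
    mode:         "zero" (Z), "average" (A) or "voronoi" (V, nearest measurement).
    """
    nz, ny, nx = shape
    mean = sum(measurements.values()) / len(measurements)

    def value_at(cell):
        if cell in measurements:
            return measurements[cell]
        if mode == "zero":
            return 0.0
        if mode == "average":
            return mean
        if mode == "voronoi":
            nearest = min(measurements,
                          key=lambda m: sum((a - b) ** 2 for a, b in zip(m, cell)))
            return measurements[nearest]
        raise ValueError(f"unknown fill mode: {mode}")

    return [[[value_at((k, j, i)) for i in range(nx)] for j in range(ny)]
            for k in range(nz)]


row = {(0, 0, 0): 2.0, (0, 0, 3): 6.0}
print(fill_channel((1, 1, 4), row, "zero")[0][0])     # [2.0, 0.0, 0.0, 6.0]
print(fill_channel((1, 1, 4), row, "average")[0][0])  # [2.0, 4.0, 4.0, 6.0]
print(fill_channel((1, 1, 4), row, "voronoi")[0][0])  # [2.0, 2.0, 6.0, 6.0]
```

The Voronoi fill is the only one that creates a step between unmeasured cells (2.0 to 6.0 above); with noisy, closely spaced measurements such steps are where partition-border artifacts can originate. The brute-force nearest search is only meant for small examples.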
For this evaluation, we used the same input noise distribution as observed during training (Gaussian noise and random bias). Supplementary Figure 3 C) shows the distribution of the relative velocity magnitude and TKE prediction errors over the full flow domain on the left (blue) and excluding the lowest four cells above the terrain on the right (green). The latter results (equivalent to scoring the network output only above an altitude of 46 m) illustrate the predictive performance in realistic sUAV flight regimes. There, all WindSeer variants produced more accurate wind velocity predictions (median error reduction AD4: 11.1 % to 8.3 %, AD6: 12.0 % to 8.9 %, ZD4: 12.2 % to 9.0 %, ZD6: 11.0 % to 8.1 %, VD4: 12.1 % to 8.0 %, VD6: 11.9 % to 7.6 %). In contrast, the TKE errors do not change significantly on the reduced prediction volume, since the computed TKE values close to the terrain tend to be smoother than the velocity values and thus suffer less from resolution differences between WindSeer and the CFD simulations. All WindSeer variants reach a similar median TKE error between 10.8 % and 11.9 %. Which model performs best depends on the metric: the average-fill models score the lowest mean error, the Voronoi variants yield the lowest median error, and the zero-fill variants are the most consistent, with the lowest 75th percentile.
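A sketch of the scoring with and without the near-terrain layer is shown below. It assumes that the relative error of a sample is the sum of absolute velocity-magnitude errors over the scored air cells divided by the sum of label magnitudes; the exact normalization behind Supplementary Figure 3 C) is not given here, so this definition, the `terrain_top` index layout, and the function names are assumptions.

```python
import statistics


def relative_error(pred, label, terrain_top, exclude_cells=0):
    """Relative error of one sample over the air cells that are scored.

    pred, label:   nested lists [k][j][i] of velocity magnitudes (k = vertical).
    terrain_top:   [j][i] index of the first air cell above the terrain per column.
    exclude_cells: number of air cells directly above the terrain to skip
                   (e.g. 4 for the "excluding the lowest four cells" setting).
    """
    abs_err = abs_ref = 0.0
    for k, (pred_k, label_k) in enumerate(zip(pred, label)):
        for j, (pred_j, label_j) in enumerate(zip(pred_k, label_k)):
            for i, (p, l) in enumerate(zip(pred_j, label_j)):
                if k < terrain_top[j][i] + exclude_cells:
                    continue  # inside the terrain or in the excluded near-ground layer
                abs_err += abs(p - l)
                abs_ref += abs(l)
    if abs_ref == 0.0:
        raise ValueError("no scored cells with a non-zero label")
    return abs_err / abs_ref


def summarize(errors):
    """Median, mean and 75th percentile over the per-sample errors."""
    return {
        "median": statistics.median(errors),
        "mean": statistics.fmean(errors),
        "p75": statistics.quantiles(errors, n=4)[2],
    }


# One 3-cell column whose only error is in the lowest air cell.
pred = [[[2.0]], [[1.0]], [[1.0]]]
label = [[[1.0]], [[1.0]], [[1.0]]]
print(relative_error(pred, label, [[0]]))                   # 0.3333333333333333
print(relative_error(pred, label, [[0]], exclude_cells=1))  # 0.0
```

Computing both variants per test sample and passing each list of errors to `summarize` gives, under these assumptions, the kind of paired statistics compared in the figure: errors concentrated near the ground vanish once that layer is excluded.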
Overall, as evident in Supplementary Figure 3 C), the metrics show no significant performance difference between the WindSeer variants. However, a qualitative assessment of the predicted wind fields reveals that the Voronoi tessellation models (VD4 and VD6) show strong artifacts along the partition borders in certain cases. The other WindSeer variants either do not exhibit such artifacts or are affected at a much smaller scale. These artifacts arise when noisy measurements lie close to each other: neighboring Voronoi partitions then carry very different values, producing sharp jumps at their borders. Supplementary Figure 4 shows one such case and the resulting predictions of the VD4 and AD4 WindSeer variants. These results indicate that while Voronoi tessellation has been shown to work with sparse input data for flow prediction [4], in our setting, with highly noisy measurements from only a small sub-region of the domain, this input modality can produce predictions containing artifacts. We therefore evaluate only the artifact-free WindSeer variants (AD4, AD6, ZD4, ZD6) on the real wind data.

Measurement campaign results of different WindSeer variants
We evaluated the performance of the different WindSeer variants on the Bolund, Askervein, and Perdigão datasets and report the prediction error and correlation averaged over certain wind cases, as in Tab. 4. The changes in the prediction grid outlined in Section "Measurement campaign datasets" resulted in much sparser inputs, with measurement densities of 3.5×10⁻⁶ % to 3.2×10⁻⁵ %, compared to 1.1×10⁻³ % to 0.19 % during training. The average-fill models (AD4, AD6) generalized to this much sparser input data, in contrast to the zero-fill models (ZD4, ZD6), which severely underpredicted the wind regardless of the measurement values. We therefore compared only the two performant WindSeer variants against an averaging baseline (AVG), which assumes that the wind and TKE are constant and predicts the average of all measurements over the full domain; the prediction errors and correlations are reported in Tab. 4. The AD4 variant resulted in better wind magnitude predictions, while AD6 predicted the vertical wind better. AD6 predicts the TKE with a lower error, but also with lower correlation than AD4. Overall, both WindSeer variants performed similarly in most cases, which explains the small differences in the errors averaged over all cases for the three metrics. We chose the AD4 variant as our WindSeer model because it did not over-fit during training.

Supplementary Table 3
sUAV flight results: the mean absolute error and the correlation for the horizontal wind magnitude W_hor, wind direction Ψ_hor, and vertical wind W_z on the loiter-averaged data. The results are averaged over all loiters of all planes for the respective flight. The best-performing model for each case is highlighted in bold.
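The averaging baseline (AVG) and the two reported metric types can be sketched as follows. The baseline follows the description in the measurement-campaign comparison (a constant prediction equal to the mean of all measurements); the metrics are the standard mean absolute error and Pearson correlation, and `measurement_density_percent` only shows how density figures such as 3.5×10⁻⁶ % relate measured cells to grid cells. All function names are hypothetical.

```python
import math


def averaging_baseline(measured_values, n_points):
    """AVG baseline: predict the mean of all measurements at every evaluation point."""
    mean = sum(measured_values) / len(measured_values)
    return [mean] * n_points


def mean_absolute_error(pred, ref):
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def pearson(a, b):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def measurement_density_percent(n_measured_cells, n_grid_cells):
    """Share of grid cells that contain a measurement, in percent."""
    return 100.0 * n_measured_cells / n_grid_cells
```

Note that a constant prediction has zero variance, so its Pearson correlation over the points of a single case is undefined; the sketch returns NaN in that situation rather than a misleading number.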

sUAV results of different WindSeer variants
We present the error metrics for all flights and model variants in Tab. 3. All models consistently struggle to predict the horizontal wind, while their vertical wind predictions are much more accurate than those of the baseline. The average-fill models rely strongly on the averaged measurement and therefore represent the overall flow state well when the flow is close to uniform. In the flight experiments over complex topography, however, the measured wind is not well described by its average, so the zero-fill models (ZD4, ZD6), which learned to better encode the measurement locations, outperform the average-fill variants (AD4, AD6). In this set of experiments, we also see a slight trend that increasing the network depth improves the prediction quality.
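Supplementary Table 3 reports a mean absolute error for the wind direction Ψ_hor. Because directions wrap around at 360°, a plain difference would count 350° versus 10° as a 340° error. A minimal sketch that wraps each difference into [−180°, 180°) and averages directions within a loiter as unit vectors is shown below; whether the original evaluation handles wrapping and averaging this way is an assumption, and the function names are hypothetical.

```python
import math


def wrap_angle_deg(delta):
    """Map an angle difference to the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def direction_mae_deg(pred_deg, meas_deg):
    """Mean absolute wind-direction error with wrap-around handling."""
    errors = [abs(wrap_angle_deg(p - m)) for p, m in zip(pred_deg, meas_deg)]
    return sum(errors) / len(errors)


def circular_mean_deg(angles_deg):
    """Average directions as unit vectors (e.g. over one loiter), in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


print(wrap_angle_deg(350.0 - 10.0))                     # -20.0
print(direction_mae_deg([350.0, 90.0], [10.0, 80.0]))  # 15.0
```

An arithmetic mean of 350° and 10° would give 180°, the opposite direction; the unit-vector average gives a value at (or numerically next to) 0°.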
where the p variables are free parameters. For the wind tunnel validation, the parameters were estimated by minimizing the mean squared error (MSE) between the sensor measurements and the ground-truth airflow angles (given by the orientation of the aircraft on a tunnel-mounted sting, assumed to have very low angular position error). Fitted to the wind tunnel data, the base functions result in an MSE of 0.45° for the AoA and 0.83° for the AoS. However, because of variations in mounts and aircraft, this calibration could not be performed for every sensor installation. We therefore defined a calibration routine that estimates the parameters of Eq. 1 and 2 from data gathered during a calibration flight, removing the need to calibrate every sUAV with wind tunnel data. The parameters are observable under the assumptions that the horizontal wind is piecewise constant and that there is no vertical wind during the calibration flight (calibration flights were performed in conditions as calm as possible, usually early in the morning). We also assume that the estimated attitude and global position/velocity are accurate. To cover the different flight regimes, our calibration flight consisted of counter-clockwise and clockwise loiter circles with radii from 30 m to 100 m, flown at different airspeeds (10 m s⁻¹ to 16 m s⁻¹). We then solved for the calibration parameters and the wind (W_x, W_y) by minimizing the wind-triangle error over the full flight with a nonlinear least-squares solver: where R(Θ, Φ, Ψ) is the rotation matrix based on the current attitude, v_Gnd the estimated ground speed vector, and ω_(·) the rotational speed around the respective axis. The offsets from the vanes to the autopilot origin are denoted by l_{x,β}, l_{z,β}, l_{x,α}, and l_{y,α}.
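The observability argument can be illustrated with a strongly simplified, linear version of the wind-triangle fit. The sketch assumes horizontal flight, no angle of attack or sideslip, no lever-arm or rotational-rate terms, a single constant wind, and one hypothetical airspeed scale factor k, so that v_Gnd = k·V_a·(cos ψ, sin ψ) + (W_x, W_y) with heading ψ measured from the x axis. This is not the calibration model of Eq. 1 and 2, which has more parameters and is solved with a nonlinear least-squares solver; it only shows why loiters that sweep through all headings make the scale factor and the wind separable.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_scale_and_wind(samples):
    """Least-squares estimate of (k, W_x, W_y) in the simplified wind triangle.

    samples: iterable of (v_gnd_x, v_gnd_y, airspeed, heading_rad).
    Each sample yields two linear equations:
        v_gnd_x = k * V_a * cos(psi) + W_x
        v_gnd_y = k * V_a * sin(psi) + W_y
    accumulated into the normal equations (A^T A) x = A^T b.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for vx, vy, va, psi in samples:
        rows = (([va * math.cos(psi), 1.0, 0.0], vx),
                ([va * math.sin(psi), 0.0, 1.0], vy))
        for row, rhs in rows:
            for r in range(3):
                atb[r] += row[r] * rhs
                for c in range(3):
                    ata[r][c] += row[r] * row[c]
    return solve_linear(ata, atb)


# Synthetic loiter: 36 headings, true k = 1.1, wind (3, -2) m/s, airspeed 12 m/s.
true_k, wx, wy, va = 1.1, 3.0, -2.0, 12.0
loiter = []
for step in range(36):
    psi = math.radians(10 * step)
    loiter.append((true_k * va * math.cos(psi) + wx,
                   true_k * va * math.sin(psi) + wy, va, psi))
print(fit_scale_and_wind(loiter))  # approximately [1.1, 3.0, -2.0]
```

Over a full circle the heading terms average out, so the normal matrix is nearly diagonal and the wind and scale factor are well separated; on a straight leg at a single heading the columns for k and the wind become collinear and the fit degenerates, which is why the calibration flight is built from loiters.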
Using the uncalibrated measurements from the airflow sensors results in strong oscillations of the estimated horizontal wind (strongly correlated with the loiter frequency) and a vertical estimate with a non-zero mean (Supplementary Figure 7 A)), because the sensors are located within the flow disturbed by the wing and airframe. The piecewise constant horizontal wind and zero-mean vertical wind fitted by the airflow calibration pipeline are shown in Supplementary Figure 7 B). Although the difference between the segments of the horizontal wind is unconstrained, the changes are relatively small. This stable, near-constant wind (magnitude and direction) agrees with the forecast and with the observations made from the ground during the flight. The calibration significantly reduces the oscillations in the estimated wind and yields an accurate, zero-mean vertical wind measurement (Supplementary Figure 7 C)). However, some correlation between the wind estimates and the loiter frequency remains, indicating that the calibration function could still be improved.
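One way to quantify the remaining correlation with the loiter frequency is to measure the amplitude of the wind-estimate component at that frequency. The sketch below uses a single-bin discrete Fourier transform and is exact when the record spans an integer number of loiter periods at a uniform sample rate; it is an illustrative check, not the analysis behind Supplementary Figure 7, and the function name and example numbers are hypothetical.

```python
import math


def amplitude_at_frequency(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`.

    The mean is removed first, so a constant wind offset does not contribute.
    """
    n = len(signal)
    mean = sum(signal) / n
    re = im = 0.0
    for t, s in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * t / sample_rate_hz
        re += (s - mean) * math.cos(phase)
        im += (s - mean) * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n


# Example: 5 m/s wind with a 1 m/s oscillation at a 0.05 Hz loiter frequency,
# sampled at 10 Hz for exactly 5 loiters (100 s).
fs, f_loiter = 10.0, 0.05
wind = [5.0 + math.sin(2.0 * math.pi * f_loiter * t / fs) for t in range(1000)]
print(round(amplitude_at_frequency(wind, fs, f_loiter), 6))  # 1.0
```

Comparing this amplitude before and after calibration gives a single number for how much loiter-synchronous oscillation the calibration removed.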

Calibration quality
In contrast to the calibration flights, the wind estimates from the sUAVs in the actual data collection flights again show some oscillations that correlate strongly with the loiter patterns. However, in these flights we expect the wind to vary across locations, so the oscillations could reflect real changes in the wind field. Supplementary Figure 8 displays the binned wind observations from two loiter patterns flown next to each other at the same altitude. Especially for the horizontal wind measurements w_e and w_n, the same pattern repeats for both loiters with an amplitude of about 1 m s⁻¹ in each direction. This pattern suggests errors in the horizontal measurements of about ±1 m s⁻¹, comparable to the observed variation of the measurements between the sUAVs and within a single flight. For the vertical wind, we do not see such repeating patterns and therefore expect these measurements to be of higher quality.
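The repeating pattern can be exposed by binning the wind estimates of each loiter by their angular position around the loiter center and comparing the binned profiles of neighboring loiters. The sketch below is an illustration under that assumption; Supplementary Figure 8 shows a top-down binning whose exact bin layout is not described here, and the function name, angular binning, and input layout are hypothetical.

```python
import math


def bin_by_loiter_angle(samples, center, n_bins):
    """Average wind samples in angular bins around a loiter center.

    samples: iterable of (x, y, value) with positions in meters.
    center:  (x, y) of the loiter center.
    Returns one mean per bin (None for empty bins); bin 0 starts at the +x axis
    and bins advance counter-clockwise.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y, value in samples:
        angle = math.atan2(y - center[1], x - center[0]) % (2.0 * math.pi)
        # min() guards against an angle that rounds up to exactly 2*pi.
        b = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
        sums[b] += value
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


points = [(1.0, 1.0, 1.0), (-1.0, 1.0, 2.0), (-1.0, -1.0, 3.0),
          (1.0, -1.0, 4.0), (2.0, 1.0, 3.0)]
print(bin_by_loiter_angle(points, (0.0, 0.0), 4))  # [2.0, 2.0, 3.0, 4.0]
```

If two loiters flown side by side at the same altitude yield nearly the same binned profile, as described for Supplementary Figure 8, the pattern is more likely tied to the aircraft's position on the circle (and hence its heading) than to spatial structure in the wind field.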
The calibration flight was performed at 540 m above mean sea level, whereas the data collection flights took place at 1600 m to 2200 m, where the air density is 10.5 % to 15.1 % lower. Previous work has shown that density changes alter the flow field [7,8]. This could partly explain the difficulty of accurately calibrating the airflow sensors when they are located within the disturbed flow field of the airframe. For future flights, the sensors should therefore be placed further away from the wings and fuselage to minimize aerodynamic disturbances, and calibration flights should be performed at the same altitude as the test flights.
where PATH TO DATASET is the path to the corresponding dataset file, PATH TO MODEL is the path to the model, and TWR and CASE specify the tower and case to predict. IDX must be set to 7 for the Askervein experiments, 10 for Bolund, and 6 for Perdigão. To show the predictions of the averaging baseline, add the --baseline flag, and to compute the data for the scatter plots, add the --benchmark flag. The data for subfigure 6 D is obtained by executing the above command for every 5-minute and 1-hour averaged data case.

Figure 7
To recreate these figures, the unzipped UAV flight data.zip file, the WindSeer source code, and the AD4 and ZD6 model folders are required. In addition, you will need a GeoTIFF of the Chasseral and Oberalppass region. The subfigures can be generated as follows:

Supplementary Figure 2
(A) Small uncrewed aerial vehicle (sUAV) wind measurements compared to the NWP at different flight altitudes, showing the mismatch in wind direction and speed. (B) The coarse terrain representation in the NWP causes offsets between the prediction altitudes and the terrain. Source data are provided as a Source Data file.

Supplementary Figure 3
Ablation study results on the test set containing 4764 samples. (A) Models trained with different levels of Gaussian noise and bias, evaluated with the same noise distribution used during training. (B) The same models as in A, evaluated without noise on the input data. (C) Relative wind magnitude and turbulence kinetic energy (TKE) prediction errors of the WindSeer variants on the CFD test set over the full domain (left, blue) and excluding the cells closest to the terrain (right, green). In contrast to the velocity errors, excluding the cells closest to the terrain does not change the TKE prediction error. Boxes extend from the first to the third quartile of the data. The median is indicated by a line and the mean by a star. Whiskers extend to the most extreme data points within 1.5 times the interquartile range beyond the first and third quartiles. Outliers (outside the whiskers) are plotted individually. Source data are provided as a Source Data file.

Supplementary Figure 4
(A) A horizontal slice through the domain showing the input to the Voronoi tessellation WindSeer variant (VD4). The measurement mask highlights the cells containing measurements in red. (B) The resulting prediction, containing strong artifacts along the cell edges, especially in the horizontal wind. The CFD label flow and the AD4 prediction. The AD4 prediction still shows some artifacts due to the input measurements, albeit much less pronounced.

Supplementary Figure 8
(A) Top-down view of the binned wind measurements of one sUAV during the first Chasseral flight for two loiters flown at the same altitude.

Table 1
Baseline hyperparameter set used in the ablation study.